E-mail Classification by Decision Forests

Authors

  • Irena Koprinska
  • Felix Trieu
  • Josiah Poon
  • James Clark
Abstract

We investigate the use of decision forests for automated filing of e-mail into folders and for junk e-mail filtering. The experiments show that decision forests offer the following advantages: (i) the ability to deal with the large dimensionality of feature vectors in text categorization; (ii) improved accuracy of the ensemble over single decision trees, and a favourable comparison with a number of other highly accurate classifiers, including neural networks and boosted decision trees; and (iii) acceptable computational expense.
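The abstract does not say how the forests are constructed. As a minimal sketch, assuming a random-forest-style construction (each tree grown on a bootstrap sample, with a random subset of features considered at each split) and binary bag-of-words features, the following stdlib-only Python shows how an ensemble of decision trees can label e-mail by majority vote. The keyword vocabulary, the toy messages, and all function names (`featurize`, `build_tree`, `train_forest`, `predict_forest`) are hypothetical illustrations, not the authors' method or data.

```python
import random
from collections import Counter

# Hypothetical keyword vocabulary; a real filter would use thousands of terms.
# Random-forest-style construction is assumed; the paper's method may differ.
VOCAB = ["free", "cash", "winner", "meeting", "report", "lunch"]


def featurize(text, vocab=VOCAB):
    """Binary bag-of-words vector: 1 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_features, rng):
    """Grow a binary tree; each split examines a random subset of features."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: a label string
    best = None
    for f in rng.sample(range(len(X[0])), n_features):
        left = [lab for x, lab in zip(X, y) if x[f] == 0]
        right = [lab for x, lab in zip(X, y) if x[f] == 1]
        if not left or not right:
            continue  # feature does not split this sample
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority(y)
    f = best[1]
    children = []
    for value in (0, 1):  # 0 = word absent, 1 = word present
        idx = [i for i, x in enumerate(X) if x[f] == value]
        children.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                   depth + 1, max_depth, n_features, rng))
    return (f, children[0], children[1])  # internal node


def predict_tree(node, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, absent, present = node
        node = present if x[f] == 1 else absent
    return node


def train_forest(X, y, n_trees=25, max_depth=3, n_features=None, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    d = len(X[0])
    n_features = min(n_features or max(1, int(d ** 0.5)), d)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the individual trees' predictions."""
    return majority([predict_tree(tree, x) for tree in forest])


# Toy training set (hypothetical).
train = [
    ("free cash for the lucky winner", "spam"),
    ("winner claim free cash today", "spam"),
    ("cash prize winner gets free gift", "spam"),
    ("you are a winner of free cash", "spam"),
    ("meeting moved see the report before lunch", "ham"),
    ("lunch after the meeting to discuss the report", "ham"),
    ("report draft ready for the meeting after lunch", "ham"),
    ("can we review the report over lunch before the meeting", "ham"),
]
X = [featurize(text) for text, _ in train]
y = [label for _, label in train]
forest = train_forest(X, y)
print(predict_forest(forest, featurize("free cash for every winner")))
print(predict_forest(forest, featurize("notes from the meeting report and lunch")))
```

Sampling only about √d of the d vocabulary features at each split is one common way to keep tree induction cheap when feature vectors are very high-dimensional; the abstract does not state whether the authors' forests use this scheme.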


Similar resources

Voting-based Classification for E-mail Spam Detection

The problem of spam e-mail has gained a tremendous amount of attention. Although entities tend to use e-mail spam filter applications to filter out received spam e-mails, marketing companies still tend to send unsolicited emails in bulk and users still receive a reasonable amount of spam e-mail despite those filtering applications. This work proposes a new method for classifying emails into spa...


Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees

The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies. In this case, one can expect many combinations among the variables, and unfortunately the usual random forests method does not exploit this situation effectively. We here investig...


THE FLORA OF THREATENED BLACK ALDER FORESTS IN THE CASPIAN LOWLANDS, NORTHERN IRAN

The Caspian (Hyrcanian) lowland forest zone in northern Iran is characterized by small remnant alder forest communities, dominated or subdominated by a Euxino-Hyrcanian element, Alnus glutinosa ssp. barbata. The first floristic inventory of these alder forests in northern Iran is presented. The floristic catalogue is based on the data of 133 phytosociological relevés in eight different alder...


Population variation of Artemisia sieberi in Iran based on quantitative characters of leaf and seed and their relationships with habitat features

Thirty-four populations of Artemisia sieberi from 10 provinces of Iran were investigated with respect to quantitative characteristics of leaves and seeds. In each habitat, five plants were randomly selected and some branches were harvested for studying leaf characteristics in spring and seed characteristics in autumn. Principal features of climate and soil were studied in each habitat. In order ...


Learning to classify e-mail

In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and spam e-mail filtering. Firstly, in a supervised learning setting, we investigate the use of random forest for automatic e-mail filing into folders and spam e-mail filtering. We show that random forest is a good choice for these tasks as it runs fast on large an...


Publication date: 2003